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Summary 



What energetic and solvation effects underlie the remarkable two-state thermodynamics 
and folding/unfolding kinetics of small single-domain proteins? To address this question, 
we investigate the folding and unfolding of a hierarchy of continuum Langevin dynamics 
models of chymotrypsin inhibitor 2. We find that residue-based additive Go-like contact 
energies, although native-centric, are by themselves insufficient for proteinlike calorimet- 
ric two-state cooperativity. Further native biases by local conformational preferences 
are necessary for proteinlike thermodynamics. Kinetically, however, even models with 
both contact and local native-centric energies do not produce simple two-state chevron 
plots. Thus a model protein's thermodynamic cooperativity is not sufficient for sim- 
ple two-state kinetics. The models tested appear to have increasing internal friction 
with increasing native stability, leading to chevron rollovers that typify kinetics that are 
commonly referred to as non-two-state. The free energy profiles of these models are 
found to be sensitive to the choice of native contacts and the presumed spatial ranges of 
the contact interactions. Motivated by explicit-water considerations, we explore recent 
treatments of solvent granularity that incorporate desolvation free energy barriers into 
effective implicit-solvent intraprotein interactions. This additional feature reduces both 
folding and unfolding rates vis-a-vis that of the corresponding models without desolva- 
tion barriers, but the kinetics remain non-two-state. Taken together, our observations 
suggest that interaction mechanisms more intricate than simple Go-like constructs and 
pairwise additive solvation-like contributions are needed to rationalize some of the most 
basic generic protein properties. Therefore, as experimental constraints on protein chain 
models, requiring a consistent account of proteinlike thermodynamic and kinetic cooper- 
ativity can be more stringent and productive for some applications than simply requiring 
a model heteropolymer to fold to a target structure. 
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INTRODUCTION 



A fundamental unresolved question in molecular biology is how solvent-mediated in- 
teractions conspire to produce the highly specific structures and dynamics of proteins. 
Recent experiments on highly cooperative "two-state" folding/unfolding kinetics of small 
single-domain proteins^"^ have, however, revealed an intriguing phenomenological sim- 
plicity. Most notably, the folding rates of these proteins are found to be well correlated 
with a simple contact order parameter deducible entirely from the native contact pat- 
tern, often referred to as a protein's "topology." 

Prom a reductionist viewpoint, protein behavior is ultimately determined by the large 
collection of atoms of the protein and those that constitute the solvent. Indeed, com- 
putational studies of protein folding by all-atom simulations have led to many useful 
insights.^~^° As well, recent developments in Monte Carlo conformational samphng tech- 
niques are promising (see, e.g., ref. 10). However, currently even the most extensive 
all-atom explicit-solvent simulations of proteins^'^ do not appear to provide sufficient 
conformational coverage to tackle many equilibrium and long-timescale kinetic proper- 
ties under ambient conditions. Thus, a long-standing^^~^^ complementary approach is 
to adopt simplified lattice^^"^^ or continuum^^"^^ representations of polypeptide geome- 
tries and interactions, trading high structural resolution for the extensive conformational 
samphng that is often necessary for conceptual advances. 

Considerable progress has been made using simplified representations. But still it has 
been exceedingly difiicult to fold a heteropolymer chain to the native structure of a real 
protein by a potential function determined solely by the chain's one-dimensional amino 
acid sequence. In this context, the recent discovery of certain predictive powers of native 
topology^ has inspired intense interests in native-centric modeling. These models are 
often called Go or Go-Iikc since they make no explicit reference to a protein's sequence. 
Instead, telcological''^'^* interaction schemes are postulated to bias conformations toward 
a given native structure, in a manner similar to the original lattice constructs of Go and 
coworkers. ^^'^^ 

Remarkably, despite their apparent simplicity and artificiality, tremendous advance 
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has been made by recent innovations in native-centric modeling. This investigative ap- 
proach has proven to be a very effective tool to gain understanding in the face of difficul- 
ties encountered by more reductionist approaches. Through the questions they posed, 
critical physical insights have been gained into the thermodynamics, ^'''^'^''^'^''^^''^^"^^'^^ fold- 
ing kinetics, ^''^■^^''^^^'^^ folding rates, and transition states and folding interme- 
diates^^'^^''^^''^^''^^'^^'^'^'*^^ of proteins. For example, physical rationalizations of the observed 
relationship between contact order and folding rate have been provided by Ising-type 
native-centric models without an explicit chain representation^^"^^ as well as explicit 
Go-like chain models. In separate efforts, using a protein's database native structure 
as starting point, Gaussian elastic network^^'^^ and graph theoretic^^ models have been 
notably successful in deciphering the flexibility and vibrational modes of real proteins 
without explicitly considering the myriad intraprotein interactions involved. These mod- 
els of near-native dynamics do not originally tackle protein folding/unfolding kinetics. 
But in a recent generalization of the Gaussian elastic network model that took into 
account chain connectivity, a signiflcant correlation between experimental folding rates 
and the relaxation rates of the slowest vibrational modes was discovered, suggesting 
an intimate connection between near-native vibrations and folding/unfolding kinetics. 

Some of the successes of native-centric approaches have been attributed to the stipu- 
lation^^'^^ that Go-hke potentials are proteinlike to a certain degree because they serve to 
eliminate "to flrst order" or to minimize^^ the "energetic frustration" that is presumed 
to be minimized in real proteins. ^"^^^^ According to this view, native-centric models are 
thus left free to account for "topological frustration" ^^'^^ alone — i.e., to capture the 
physics arising solely from chain connectivity, excluded volume, and the favorability of 
the native fold. 

While we cherish the successes of the native-centric paradigm, it is also important 
to not lose sight of its limitations. In short, native-centric modeling entails: (i) Admit- 
ting our ignorance of the basic physics of protein folding, at least for the time being, 
(ii) Recognizing that a protein sequence's known native structure may contain signif- 
icant information about its actual energetics, (iii) Assuming that a Go-like potential 
inferred from the known native structure is in some sense an approximate description of 
the underlying physical energetics, (iv) Working out the logical consequences of these 
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assumptions to gain insight into various aspects of the folding process. In this perspec- 
tive, native-centric modehng should be taken as a tentative means to capture collective 
atomic/molecular effects that we don't yet understand. As such, its application may 
be physically more meaningful at a coarse-grained level (perhaps as a "renormalized" 
description). Although all-atom Go- like models^^'^^ are obviously superior in accounting 
for the important effects of sidechain packing (see. e.g., insightful discussion in ref. 46), 
physically it is even harder to justify why the interaction between a pair of atoms in gen- 
eral would depend on whether they are in close spatial proximity in a particular protein's 
native structure. Historically speaking, the renewed popularity of Go-like models in pro- 
tein folding studies since the late 1990s may arguably represent a partial backtracking 
— albeit a very productive and well justified one — in modeling philosophy. This is so 
because the desire to supersede these earlier ad hoc interaction schemes appeared to be 
an impetus for the emergence since the late 1980s of simplified lattice protein folding 
models with general sequence-dependent potentials. 

Here we endeavor to better delineate the utility and limitation of several common 
native-centric approaches to protein folding. In identifying their strengths as well as 
weaknesses, our goal is to pave the way for improved native-centric modeling and better 
reductionist approaches. It goes without saying that Go-like models are intrinsically in- 
complete, because (i) possible nonnative interactions are in large measure neglected, ^^'^^ 
and (ii) interaction (energetic) heterogeneities can be present in proteins with essen- 
tially identical topologies. Practically speaking, even if a general usefulness of native- 
centric modeling is presumed, robustness of the predictions has to be ascertained. Many 
native-centric interaction schemes can be postulated; not all of them have the same pre- 
dictions. Some discrepancies are puzzling. For example, the combination of a Go-like 
potential with an explicit chain representation should, in principle, be a more adequate 
model of topological frustration^^'"^^ than models without an explicit chain representa- 
tion. Yet so far native-centric constructs with geometrically less realistic (non-explicit) 
chain representations^^"^^'^^ appear to be more successful in reproducing experimental 
folding rates than direct folding kinetics simulations of Go- like models with explicit chain 
represent ations . 

Specifically, the present work addresses two basic questions of robustness in native- 
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centric modeling: (i) How much do the model predictions depend on the choice of native 
contacts for a given protein? (ii) To what extent would these predictions be modi- 
fied when the effects of the protein's aqueous solvent are taken into account with more 
sophistication?"^*^'^^ Pursuing a line of inquiry we have recently developed in the context 
of lattice models/^'^^'^^'®^'®^ we focus here on whether continuum coarse-grained Go-like 
energetics with an explicit chain representation can reproduce certain generic thermody- 
namic and kinetic cooperativities that have been experimentally observed across many 
real proteins. These statistical mechanics tests are stringent. For instance, the mere 
existence of a qualitatively sharp folding transition in a chain model does not necessar- 
ily imply that its underlying thermodynamics is proteinlike. ^^'^^'^^'^^ Homopolymers can 
have very sharp coil-globule transitions that are not calorimetrically two-state. Com- 
parisons between simulated and experimental chevron plots^° show that even for chain 
models that satisfy the experimental thermodynamic two-state criteria, it is nontrivial^^ 
to reproduce the highly cooperative nonglassy two-state folding kinetics^^ of many small 
single-domain proteins. Therefore, applying these tests would, in due course, facilitate 
the improvement of existing models, suggest yet unexplored avenues of native-centric 
topological modeling, and ultimately help decipher the energetics of real proteins. 



COMPARING DIFFERENT NATIVE CONTACT SETS FOR CI2 

We consider the 64-residue truncated form of chymotrypsin inhibitor 2 (CI2)^^ using 
coarse-grained representations with sidechain interactions accounted for by contacts 
between pairs of Cq, positions separated by at least three C^s along the chain sequence 
(contact order > 4). CI2 is a widely studied small single-domain protein with no disulfide 
bond. It folds and unfolds as an apparent simple two-state system. CI2 is an ideal test 
case because a large body of experimental, all-atom molecular dynamics, and native- 
centric modeling data is available (see, e.g., refs. 32, 36, 73. 74). To investigate how 
coarse-grained native-centric model predictions may be sensitive to the definition of na- 
tive contacts, here we examine two native contact sets, which we refer to as NCSl and 
NCS2. 

NCSl is determined by the distance criterion of Shea et al:^^ Two amino acid residues 
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i and j of a given protein are in contact if, in its native structure from the Protein Data 
Bank (PDB), either their Cq, atoms are less than SA apart, or any two heavy atoms one 
from each of their two sidechains are less than 4A apart, or both. Using this definition, 
there are 137 NCSl contacts. NCS2 is borrowed from Clementi et al.'s native contact 
map for CI2 (Figure 2 of ref. 32). NCS2 has 142 contacts. It was based upon the CSU 
software''^ which takes into account more detailed structural information!*! such as con- 
tact surface area and solvent accessibility. There are considerable variations in native 
Cq,-Cq, distances among contacts in both NCSl and NCS2. The minimum native contact 
distance is 4.325 A for both sets, but the maximum are 12.255 A and 15.558 A for 
NCSl and NCS2 respectively. The average native distance of NCSl (6.528 A) is smaller 
than that of NCS2 (7.288 A). However, the average sequence separations of NCSl (23.1) 
and NCS2 (22.6) are almost identical. 

Figure 1 compares the two native contact sets. They have 108 contacts in common 
(blue lines in Figure lb). Among the native contacts that are not common to both 
sets, those belonging to NCSl but not NCS2 (green lines in Figure Ic) tend to be be- 
tween two ends of the chain or involve the (31 strand (residues 27-34).^^ In contrast, 
contacts belonging to NCS2 but not NCSl (red lines in Figure Id) appear to be more 
uniformly distributed, involving more the a-helix (residues 13-23) and the region span- 
ning residues 35-44. Specific examples of these differences are provided in Figure 2, 
showing that NCSl identifies an hydrophobic-polar (alanine-arginine) contact but not 
an hydrophobic- hydrophobic (valine-phenylalanine) contact. 



MODELS AND METHODS 

Coarse-grained Potentials Without Solvation/Desolvation Barriers 

The basic construct of our native-centric potentials follows that of Clementi et al.'^^ 
For a given model protein conformation specified by the positions of all its Cq, atoms, 

*For CI2, the current version of the CSU software available from the Internet also produces a set of 
142 native contacts, all except 8 of which are identical to the contacts in NCS2. For the computational 
tests we have conducted (detailed data not shown), differences in results for this particular CSU native 
contact set and that for NCS2 are negligible. 
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where N is the total number of particles. This functional form has also been used by 
Koga and Takada.^^ Here the first three summations are for local interactions, where r, 9, 
and (f) are, respectively, the Cq-Cq, virtual bond length between successive residues along 
the chain sequence, Cq-Cq, virtual bond angles, and Cq-Cq virtual torsion angles; ro, ^o, 
and 00 are the corresponding native values in the PDB structure. These terms account 
for chain connectivity and presumed local conformational preferences for the native fold. 
The last two summations are for nonlocal interactions; rij is the spatial distance between 
two CaS that have at least three residues between them along the chain sequence. In the 
summation over native contacts (as defined above for either NCSl or NCS2), a 10-12 
Lennard Jones (LJ) form is used, where r'^j is the 0^-0^ distance between the contacting 
residue i and residue j in the PDB structure. In the summation over non-native contacts, 
Trep parametrizes the excluded volume repulsion between residue pairs that do not belong 
to the given native contact set. As in refs. 32 and 35, we use rrep = 4 A (whereas ref. 28 
uses Trep = 7.8 A). The ratios between interaction parameters are = lOOe, Kg = 20e, 
K^^ = e, and Kjp^ = 0.5e, as in ref. 32. The interaction strength is thus controlled by a 
single parameter e. We refer to the potential just described as the "without-solvation" 
model because it does not have a solvation/desolvation barrier (sec below), although the 
terms in equation 1 may be interpreted as part of an implicit-solvent scheme that takes 
into account other aspects of solvent-mediated interactions. 



Equation 1 assumes that native-centric favorable interactions have relatively long 
spatial ranges. In alternate square-well Go models, however, favorable contact inter- 
actions have sharp cutoffs. Moreover, in many lattice models, contact interactions may 
be viewed as having infinitesimal spatial ranges. Thus, to investigate how the presumed 
spatial ranges of contact interactions may affect model predictions, we study a variation 
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of the above model that restricts each of the pairwise 10-12 LJ native contact terms in 
equation 1 to Vij < l-2r^ and sets the interaction to zero for Vij > l-Sr^^-, but all other 
aspects of the model stay the same. We call this the "without-solvation-SSR" (short 
spatial range) model. 

An Approximate Account of Solvation/Desolvation Barriers 

We consider also coarse-grained "with- solvation" models designed to semi-quantitatively 
account for the solvation/desolvation free energy barriers encountered by a protein's 
constituent groups as they cluster together in aqueous solvents (Figure 3). We refer 
to these barriers simply as "desolvation barriers" below. Desolvation barriers are a ro- 
bust consequence of granularity or the particulate nature of the solvent. They have 
long been predicted by theory^^ and atomic simulations. ^^'^^ However, aside from an ear- 
lier study that used a square- well/square-shoulder form of desolvation barriers, until 
very recently^°'^°'^^ this salient physical feature was not taken into account in continuum 
coarse-grained protein models. While explicit-solvent molecular dynamics account for 
solvation effects directly, these simulations do not yet provide a definitive answer as to 
whether they can or cannot reproduce the experimentally observed thermodynamic and 
kinetic cooperativities in protein folding. Therefore, complementary "implicit-solvent"^^ 
treatments^'^'^^'^^'^'^ like the present approach are needed. Indeed, the experimentally 
based cooperative tests conducted here should also be applied to all-atom models once 
their computational efficiency has improved to make it possible. 

The scope of the present work is limited. In particular, the study of structural details 
— such as connections to the powerful experimental $-value analysis of transition-state 
structures,^''''^''''^ is deferred to future applications of our investigative framework. We first 
tackle a little-explored but fundamental question: How deeply are protein folding ther- 
modynamic and kinetic cooperativities affected by the introduction of generic desolvation 
barriers? To this end, we employ the general implicit-solvent functional form introduced 
recently by Cheung et al."**^ (Figure 3). The repulsive part of this potential (for r < r') is 
similar, though not identical, to the repulsive part of the 10-12 LJ term in the without- 
solvation model above (equation 1). The key difference is that now a free energy barrier 
is present at the midpoint (r' + r")/2 between the contact (r') and water-separated (r") 
free energy minima of a given pairwise interaction; r" — r' = 3.0 A is the approximate 
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diameter of a single water molecule. Shown in Figure 3b(i) is a potential with relative 
magnitudes of the barrier and minima similar to that in ref. 40. This form has a rel- 
atively high desolvation barrier.0 The U{r) function in the present study has a lower 
barrier (Figure 3a). As our goal is only to elucidate the generic implications of having a 
significant desolvation barrier, provided that the barrier is not negligible, a lower barrier 
is advantageous because it allows for faster kinetics and thus broader conformational 
sampling. Not the least, our choice is not inconsistent with recent explicit-water atomic 
simulations that predicted a lower pairwise desolvation barrier^^ [Figure 3b(ii)]. Now, for 
the with-solvation model, we simply replace the pairwise 10-12 LJ terms of the second 
last summation over native contacts in the Vtotai equation 1 above with f/(r)s (Figure 3a) 
for the corresponding native pairs. Other terms in equation 1 remain the same. We call 
the resulting potential function V^otai- Again, the interaction strength of a given model 
is controlled by one single parameter e. In principle, terms in both the without-solvation 
and with-solvation potentials representing solvent-mediated interactions can depend on 
temperature. To simplify the formulation, however, and especially since most of the 
results in this report entail comparing kinetic trajectories under a constant given temper- 
ature, here Vtotai and U (r) are taken to be temperature independent, as in refs. 32 and 40. 

Langevin Dynamics 

Folding and unfolding kinetics are simulated by Langevin dynamics,i using a formu- 
lation similar to Thirumalai and coworkers'. ^^'^^ For each of the 3A^ degrees of freedom 
of the model protein {x, y or z coordinates of the Cqs), the equation of motion is: 

mv{t) = Fconf (t) - m^vit) + r]{t) , (0.2) 

where m, v, i), Fconf , 7 and rj are, respectively, mass, velocity, acceleration, conformational 
force, friction (viscosity) constant and random force. The conformational force is equal 
to the negative gradient of the total potential energy of the given model (Vtotai or V^otai)- 
For the without-solvation-SSR models, conformational force from the pairwise 10-12 LJ 
native contact term in Vtotai between residues i and j is applied only if r^j < l-2rj'j-. The 

^In order not to have a negative e"/e ratio, it appears that the relation (e" — e')/{e' ~ e) — 1.33 in 

the legend for Figure 1 in Cheung et al.""^ should read (e" + e')/(f' ^ ^) — 1-33. 

■I- Alternately, Newtonian dynamics in conjunction with the Berendsen et al. algorithm^^ for coupling 

to a heat bath was used by several previous investigations"^^'"^'"' of similar Go-like coarse-grained protein 

models. 
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random force has the autocorrelation function 

iviMt')) = 2m7 k^T 5(t - t') , (0.3) 

where is Boltzmann constant times absolute temperature. Every Cq, is subject to 
a random force at each integration time step. The components of the random force 
are independently generated by setting r]i = (2m^k'QT / StY^'^^i. Here i denotes the 
uncorrelated random force components in the x, y or z directions, is a random value 
taken from a Gaussian distribution with zero mean and unit variance (obtained from 
a random number generator by standard techniques^''), and 6t is the integration time 
step. At the commencement of a simulation at temperature T, the initial velocities are 
assigned random values by setting Vi — {ksT /mY^'^^i. 

We use the velo city- ver let algorithm^^"^^ (equations 12 and 13 in ref. 89) to integrate 
equation 2. Independent of simulation conditions such as variations in e and T, the time 
scale of the model systems here is always controlled by the quantity r = ^ma^/eo, with 
the length scale a — A A and a reference energy scale eo = 1. We further set 7 = O.OSr"^ 
and use a molecular dynamics time step 5t — O.OOSr in the numerical integration. Con- 
formational sampling is performed by averaging over snapshots taken at every 400 time 
steps. Simulation times in this study are presented in units of St. The energy param- 
eter e and temperature T are given respectively in units of eo and eo/k-Q, and length is 
measured in units of A. To simplify notation, other units are chosen such that m — 1 
and /cb = 1 in the present simulations, as in Veitshans et al.^^ An approximate corre- 
spondence between model time and real protein kinetic time scales can be found in ref. 89. 



THERMODYNAMIC COOPERATIVITY 

Free Energy Profiles in Different Native- Centric Schemes 

Using the progress variable Q (native contact fraction), Figure 4 shows that confor- 
mational distribution is significantly sensitive to the choice of native contact set and the 
presumed spatial ranges of native contact interactions. Consistent with the expectation 
for a two-state protein such as CI2 and a previous without-solvation study, "^^ the free 
energy profiles for NCS2 (solid curves) exhibit a single peak at intermediate Q sepa- 
rating the native (high Q) and denatured (low Q) minima. In contrast, the NCSl free 
energy profile has a plateau-like transition region in the without-solvation formulation 
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(Figure 4a, dashed curve). More remarkably, for the without-solvation-SSR and with- 
solvation models (Figure 4b, c), the NCSl profiles develop a shallow minimum flanked 
by two peaks in the intermediate Q region (dashed curves), similar to certain postulated 
free energy profiles discussed previously, for example, by Fersht^^ and Chu and Bai,^'^ in 
the context of folding kinetics that apparently involves intermediates. Also notable is 
the progressive movement of the native minimum position from Q ^ 0.9 for the without- 
solvation models toward Q = 1 for the with-solvation models. The incorporation of 
desolvation barriers dramatically raises the overall folding/unfolding free energy barrier 
for NCS2, but only has a relatively subdued effect for NCSl (c.f. Figure 4b, c), sug- 
gesting that there is an intricate interplay between desolvation barrier effects and other 
aspects of solvent-mediated interactions in protein folding. 

Calorimetric Cooperativity: Local Conformational Preferences are Cru- 
cial 

Figure 5 assesses the calorimetric cooperativity^^'^^'^^'^^ of seven different native- 
centric models of CI2 by comparing their simulated van't Hoff over calorimetric enthalpy 
ratios AHy^./ AHcai to the experimental two-state requirement that Aifyn/AiJcai ~ 1- 
Model intraprotein interactions are taken to be temperature independent in this evalua- 
tion. Since vibrations along the bonds (equation 1) contribute to heat capacity in these 
models outside the folding/unfolding transition region, and there is experimental evi- 
dence for heat capacity contributions from bond vector motion in real proteins,^^ the sim- 
ulated AHvu/ AHcai ratio without baseline subtractions does not correspond physically to 
the experimental AH^n/ AHca\ ratio obtained by empirical baseline subtractions. ^^'^^'^''''^^ 
Thus, only the baseline-subtracted AH^n/ AHcai ratio from the models are judged 
against the experimental calorimetric two-state criterion. ^^'^^'^^ Figure 5 shows that 
AHyu/ AHcai = ~ 1 is satisfied by all six models described in the last section. 
Apparently, similar Go-like models in refs. 32 and 40 also exhibit calorimetric cooper- 
ativity. This is evident from their reported heat capacity scans although AH^n/ AH(^i 
ratios were not computed in these works. 

The role of local interactions is addressed by a different coarse-grained model with 
Go-like (through-space) contact interactions but very little local (through-bond) prefer- 
ence for the GI2 native structure. The setup of this "contact-dominant" model is similar 
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to that of the NCS2 without-solvation-SSR modeh It has the same virtual bond strength 
{Kr = lOOe), but the local native preference is weakened by a factor of 20, i.e., Kg = 0.5e, 
K^j^^ = 0.05e, and K^^ = 0.025e. Folding in this model is clearly non-two-state. In our 
simulation of this contact-dominant model, conformations very close to the target native 
structure were observed but Q = 1 was not achieved.! A numerical estimate of this 
model's heat capacity function was obtained from Langevin dynamics simulation near 
the transition midpoint. Figure 5 shows that it has a double hump, which is clearly 
dissimilar to the single-peak heat capacity scans of two-state proteins such as CI2.^^'^^ 
Moreover, near this model's temperature for the peak heat capacity value, the distri- 
bution of Q has only a single population maximum rather than being bimodal (data 
not shown). Indeed, a few highest and lowest Q values were so improbable that they 
were not sampled. These thermodynamically non-cooperative features are reflected by 
an exceedingly low van't Hoff over calorimetric enthalpy ratio of k,2^ = 0.33. 

One may conceivably argue from the "energetic vs. topological frustration" perspective' 
that energetic frustration has already been eliminated in the contact- dominant model be- 
cause its potential favors native contacts, disfavors nonnative contacts, and even slightly 
favors native bond angles and torsion angles. Yet the contact-dominant model's ther- 
modynamics is not proteinlike. The non-cooperative behavior of this particular contact- 
dominant model might have been exasperated by the exclusion of {i, i + 3) contacts in 
its formulation (see equation 1). Nonetheless, the present result echoes several recent 
findings of less-than-proteinlike thermodynamic cooperativity in continuum models with 
Go-like contact interactions but without local conformational preferences. These model 
studies include coarse-grained and all-atom discontinuous molecular dynamics models"^^'^^ 
as well as a self-consistent field theory. On the other hand, some three-dimensional lat- 
tice "contact-only" Go models are thermodynamically cooperative,^"^ probably because 
of default lattice restrictions on bond angles and torsion angles. However, in continuous 
space, the "negative design" afforded by Go-like contact interactions alone are appar- 
ently insufficient for proteinlike thermodynamics. Indeed, a protein sequence's ability 
to fold to a unique structure may be partially encoded in local signals. ^^'^^ Proteinlike 

§We have also studied similar "contact-only" models with Kg — k'"^^ — k'"^^ = in the same 
without-solvation-SSR setup as well as in the (full LJ) without-solvation formulation. These models 
have even bigger difficulties reaching conformations with Q w 1 than the contact-dominant model. 
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behavior requires minimization of energetic frustration of the target native structure as 
well as enhanced frustration in the competing nonnative conformations.^^ A comparison 
between the contact-dominant model and the other models with local native propen- 
sities in Figure 5 suggests that an interplay between local conformational preference 
and nonlocal compactification forces^''''^^'^'''^^'^^'^^^ are necessary for proteinlike thermo- 
dynamic cooperativity. For this conclusion to be properly interpreted, we hasten to add 
that structural details of sidechain packing, hydrogen bonding, as well as general non- 
native-centric physical restrictions on bond angles and torsion angles (as in standard 
non-Go-like force fields) have not been taken into account in the present coarse-grained 
(residue-based) contact-dominant model. But these effects are operative in real proteins. 
Clearly, these interactions must be part of the physical basis of any local propensity^°^ 
for the native fold in a more complete all-atom description. 



KINETIC COOPERATIVITY 

Sharp Kinetic Transitions Between Two Thermodynamic States 

Folding kinetics in explicit-chain Go-like models have been investigated using equilib- 
rium sampling in conjunction with free energy profile analyses^^ as well as direct dynam- 
ics simulations.^^ Here, around their respective transition midpoints, all six native-centric 
CI2 models — NCSl or NCS2, with or without solvation — have kinetic characteristics 
consistent with their thermodynamic two-state cooperativity. Figure 6a and b show 
that the kinetic transitions between the native and denatured ensembles are sudden and 
sharp. Figure 6c and d show that the distributions of potential energy and Q are bi- 
modally well separated into native and denatured regions, and the correlation between 
potential energy and Q is generally linear. A consistency check has also been made using 
Figure 6c, which provided an average kinetic energy of 78.9. Equating this with 3NT/2 
for = 64 (equipartition theorem) yields T = 0.8219, which is essentially identical to 
the input simulation temperature of T = 0.82, as it should. Figure 6c and d further indi- 
cate that after the initiation of folding around the transition midpoint, pre-equilibration 
of the denatured ensemble is rapid relative to the folding time scale. 

Chevron Plots: Matching Kinetics with Thermodynamics? 
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Bearing in mind that protein thermodynamic cooperativity is necessary but not suf- 
ficient for simple two-state folding/unfolding kinetics, we proceed to evahiate model 
predictions against experimental stability curves and chevron plots. To do so, we deter- 
mine model folding and unfolding rates using direct dynamics simulations over extensive 
ranges of native stability by varying the interaction parameter e at constant temperature. 
Since the simulated kinetics are essentially single-exponential (see below), the folding or 
unfolding rate may be taken to be approximately the reciprocal of the corresponding 
mean first passage time (MFPT). The natural logarithms of the rates are plotted as 
functions of e in Figures 7-9. Inasmuch as it was computationally feasible, first passage 
times (FPTs) of a large number of trajectories were used to provide rehable estimates 
of MFPTs (Tables 1-3). As one of us has argued,^^'^'^^ the variation of e may serve 
as a tentative model for varying denaturant concentration, though the detailed physics 
of how the effects of chemical denaturants should be incorporated into coarse-grained 
protein models is a subject of ongoing research. Here we view the upper panels of 
Figures 7 - 9 as model equivalences of chevron plots. 

Native stability curves of the models as functions of e are plotted in the lower panels of 
Figures 7-9. They show that the free energy of unfolding between the native minimum 
and low-Q open conformations are approximately linear in e (upper solid and dashed 
curves). These quasi-linear stability curves estimated from simulation data around the 
transition midpoint correspond to those obtained experimentally by empirical linear ex- 
trapolation from directly measured data around the transition region. In contrast, 
the free energy difference between the native minimum and a denatured-state ensemble 
encompassing \ow-Q as well as intermediate-Q conformations (lower solid and dashed 
curves) is nonlinear in e, similar to that observed in previously lattice model studies. ^^'^^ 
This is an expected feature^^'^^ intimately connected to the multiple-conformation na- 
ture of the native state,^'''^^ and is consistent with recent native-state hydrogen exchange 
experiments. These characteristics of native stability underscore the fact that the 
operational definition of calorimetric two-state behavior (see above) does not^^ necessar- 
ily imply that all denatured conformations have the same stability. Even for calorimetri- 
cally two-state proteins under native conditions well below the global folding/unfolding 
transition midpoint, the population of partially unfolded conformations^^'^®'^^'''^^^ can 
sometimes be non-negligible as long as it does not exceed a certain threshold.^^'^^ 
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Figures 7-9 show that the transition midpoints determined by thermodynamics 
and kinetics are quite close, with only minor discrepancies. The discrepancies for NCSl 
models appear to be slightly larger in Figures 8 and 9. This is probably related to the 
high-free-energy minima in the transition regions of the corresponding NCSl free energy 
profiles (Figure 4b, c). More surprisingly, however, is that even with their native-centric 
potentials, all six models fail to produce the type of simple two-state folding/unfolding 
kinetics observed experimentally for CI2^^ and many other small single-domain proteins/ 
The operational definition^^'^^^ for simple two-state folding/unfolding kinetics requires 
that the logarithmic folding and unfolding rates under constant temperature be approxi- 
mately linear in native stability, and that the natural logarithm of the directly measured 
and linearly extrapolated (folding rate) / (unfolding rate) ratio as a function of denaturant 
concentration matches the directly measured and linearly extrapolated^^' equilibrium 
free energy of unfolding in units of ksT. Here, the dashed-dotted V-shapes in the upper 
panels of Figures 7-9 show that as —e/ksT is changed at constant T from the transi- 
tion midpoint towards either more native or more denaturing conditions, the respective 
trends of increase in simulated folding or unfolding rate fall short of this requirement 
for the kinetics to be simple two-state. Instead, our models' behavior is more akin to 
proteins that exhibit chevron rollovers, such as ribonuclease A^^^ and barnase,^^^ whose 
kinetics are operationally referred to as non-two-state.^^'^^'^^^'^^^ Comparisons with ex- 
perimental chevron plots have not been made in other studies of continuum Go models, 
but the reported results indicate that they also do not predict simple two-state chevron 
behavior (see, e.g.. Figure 2 of ref . 34). 

The four without-solvation and without-solvation-SSR models in Figures 7 and 8 
show a clear rollover in both the folding and unfolding arms of their chevron plots. Re- 
flecting the lower barriers along their free energy profiles (Figure 4), kinetics are generally 
faster for the without-solvation and without-solvation-SSR than the corresponding with- 
solvation models (Figure 9). For the with- solvation models, the rate at a given — c/A^bT 
is substantially slower for NCS2 than that for NCSl. This trend is consistent with 
NCS2's much higher free energy barrier in the transition region (Figure 4c). Most re- 
markably, comparing Figures 7, 8 against Figure 9 demonstrates a dramatic impact of 
desolvation barriers on the folding/unfolding kinetics. In contrast to the chevron plots 
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with significant curvatures for tlie witliout-solvation and witliout-solvation-SSR models, 
botli tlie folding and unfolding arms of the chevron plots are quasi-linear for the with- 
solvation models. It is reassuring that the with-solvation models are more proteinlike in 
this respect. ^^^'^^"^ Nevertheless, their deviations from simple two-state kinetics are huge: 
The slopes of the simulated chevron plots are only approximately 1/5 that required for 
simple two-state kinetics (c.f. the V-shape in the upper panel of Figure 9). Therefore, 
the conclusion that these models' kinetics correspond to those operationally referred to 
as non-two-state should be reliable. This is because possible numerical uncertainties in 
the estimation of stability curves by histogram techniques (lower panels of Figure 9) are 
not likely to cause a factor-of-five discrepancy. Interestingly, similar mismatches between 
extrapolated chevron plots and direct native stability measurements, albeit to a lesser 
degree, have also been observed for real proteins. 

Single-Exponential Relaxation 

Experimentally, kinetic relaxation of many small single-domain proteins^ such as 
Qj272,95 g^j^^ some apparently non-two-state proteins with chevron roUovers^^^'^^^ are 
found to be essentially single-exponential. Therefore, it is of interest to ascertain whether 
the present models embody this hallmark, even though they are not kinetically simple 
two-state. For this purpose, we examine the distributions of first passage times (FPTs, 
as defined in Figures 7-9). Let P{t)dt be the probability for the FPT of a given kinetic 
process to lie within a range dt around time t. If the relaxation is single-exponential, 

dt' P{t') = 1 - e-*^(*-*°) , (0.4) 



/ 

Jtn 



'to 

where k is the kinetic rate, and to > is a minimum FPT to take into consideration the 
finite time needed for pre-equilibration after initiation of the kinetic process at t — 0. It 
follows that 

MFPT = f dt' t'P{t') =to + \. (0.5) 

To assess whether a given FPT distribution conforms to this description, a quantity 
P(t)At is computed by binning FPTs into time slots^^^ of size At. If the kinetic process 
is single-exponential, 

\n[P(t)At] = l\n( — ) + ^5 1 , (0.6) 

^ ^' ^ \ VMFPT-to/ MFPT- to i MFPT - to 

i.e., ln[P(t)At] versus t should be a straight Une with slope = —(MFPT — to)~^. 
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Figure 10a shows that even under strongly native conditions concomitant with a 
significant chevron rollover, the NCS2 without-solvation-SSR model has approximately 
single-exponential relaxation. This behavior echoes that of a recent four-helix-bundle 
lattice model®^ (Figure 10b). Consistent with equation 6, a comparison between the 
filled and open circles in Figure 10a indicates that while changing the bin size At natu- 
rally changes the ln[P{t)At] values, reasonable variations in At do not affect the slope 
of the ln[P(t)At] distribution. Figure 11 applies similar analyses to folding and unfold- 
ing in other models in the present study under representative native and denaturing 
conditions. Owing to computational limitations, the sample sizes for the FPT dis- 
tributions are not very large, especially for the with-solvation models in Figure 11c. 
Consequently, a certain level of statistical uncertainties ensued. Nonetheless, Figure 11 
shows that for all cases tested, our data is consistent with single-exponential relax- 
ation. As pointed out by Fersht,^^ the high-free-energy minima along the NCSl free 
energy profiles (Figure 4b, c) do not preclude apparent single-exponential kinetics. The 
viability of equation 6 for our models is further buttressed by the relatively small dif- 
ferences between the slopes of the least-square- fitted lines in Figure 11 and the quantity 
— (MFPT — to) "^5 where to is taken to be the minimum FPT encountered in the sim- 
ulated trajectories of a given model: For the models and their simulation conditions 
listed in the legend of Figure 11, and in the same order, {[10^ x (MFPT — to)"^], [—10^ x 
slope]} = {10.0,10.6}, {6.13,6.71}, {6.11,6.53}, {7.28,7.56}, {28.6,36.1}, {6.41,7.02}, 
{19.5,26.4}, {8.53,9.36}, {2.05,2.08}, {0.504,0.431}, {0.242,0.189}, and {0.199,0.161}. 

The native-centric formulations in the present Go-like models lead to folding rates 
that are at least four orders of magnitdue faster than the experimental CI2 folding rates. 
At 25°C and pH 6.3, the experimental CI2 folding rates at zero denaturant (native sta- 
bility AG = 12.0/^6^) and the transition midpoint (in 3.92 M GdnHCl, native stability 
AG = 0) are, respectively, 47.8 sec~^ and 0.035 sec~^ (ref. 95). If we use the physical 
argument of Veitshans et al.^^ to identify the Langevin time unit 6t with a real time 
scale of ~ 10~^^ sec, the folding rate of the NCS2 with-solvation model in Figure 9 is 
~ 10^ sec~^ at AG = 12.0fcBT and ~ 10^ sec"^ at AG = 0. Corresponding folding 

^Rates in the chevron plots in Figures 7-9 are computed by taking to = 0. Our calculations 
indicate that using finite tos instead of to = to determine the rates k via equation 5 only leads to 
minimal modifications on the chevron plots (data not shown). The conclusions regarding rollovers and 
non-two-state kinetics remain unchanged. 
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rates of other models in Figures 7-9 are even faster by approximately two orders of 
magnitude. Despite these discrepancies, native-centric constructs do capture part of real 
protein energetics. This is evident from studies of extensive sets of real proteins using 
explicit-chain Go models, wherein theoretically predicted folding'^^ and relaxation^^ rates 
were found to correlate reasonably well with the experimental folding rates. However, it 
is noteworthy that the spread of these model-predicted rates among the set of proteins 
tested is apparently at least 1.5-2 orders of magnitude narrower than the diversity of 
experimental folding rates. (c.f. Figure 5 of ref. 44). This suggests that certain basic 
aspects of protein energetics are yet to be taken into account by common Go-like models. 
In a similar vein, the chevron rollovers in Figures 7 -9 represent a failure to account for 
the high degree of diversity in folding rates of a given protein under different native con- 
ditions. For real GI2, the folding rates at zero denaturant and at the transition midpoint 
differ by three orders of magnitude. But the Go-hke models in Figures 7 -9 predict only 
one order of magnitude difference. 

Chevron Rollover: Stability- Dependent Front Factor? 

To better understand the chevron rollovers. Figure 12 apphes a protocol we recently 
developed^^ to assess the models' conformity to the commonly employed transition state 
picture in interpreting protein folding experiments. Model data is now fitted to the 
expression 



rate = F(e, T) exp 



AG't(e, T) 



(0.7) 



for folding or unfolding rate, taken as (MFPT)~^ from the direct dynamics simulations. 
On the other side of the above equation, AG^ is an activation free energy determined 
soley by thermodynamic Boltzmann weights^^ using the method of Nymeyer et al.,^^'^^^ 
F is the corresponding front factor. Figure 12 shows that, in contrast to the 
usual stipulation^^^ that the front factors of small single-domain proteins such as GI2 
are essentially independent of intraprotein interaction strength and native stability, the 
F factors deduced from the present analysis are highly sensitive to e. This imphes that 
thermodynamic analyses of free energy profiles alone cannot predict the e-dependencies of 
the kinetic rates, ^^'^^'^^'^^^ and the chevron rollovers are underpinned by native-stability- 
dependent front factors in these models. This hypothesis regarding the physical origin 
of chevron rollover may soon be testable by single-molecule techniques. In addition 
to the definitions for unfolded, transition, and folded regions in Figure 12, we analyzed 
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several other physically reasonable alternate Q-based definitions for these states (data 
not shown). Whereas the absolute value of F varies somewhat, the overall trend of de- 
pendence on e remains essentially unchanged. This resilience is similar to that observed 
in our previous analysis of the folding front factor of a 55mer lattice model (Figure 5 of 
ref. 68). 

Thus, for this key aspect of chevron behavior, the present native-centric models' 
kinetics clearly do not resemble the simple two-state kinetics of CI2.^^'^^ The ramifica- 
tion of this finding is far reaching, as it bears on the basic energetics of protein folding 
(see Discussion below). As they stand, the apparently non- two-state kinetics of these 
physical self-contained polymer models^^ also shed light on the folding of other pro- 
teins that exhibit similar chevron rollovers as well.^^'^^^'^^^ To date, rationalizations of 
chevron rollovers include deadtime intermediates, specific kinetic traps,^^'^^^'^^^ peak- 
shifting on complex free energy profiles,^^'^^^ burst phase continuum, and internal 
friction as manifested by front factors that depend on native stability (ref. 68 and dis- 
cussion therein). These perspectives are not necessarily mutually exclusive. For exam- 
ple, internal friction may arise from kinetic trapping mechanisms (H. K. & H. S. C, in 
preparation). In any event, chevron rollover is an unequivocal prediction of the present 
models, irrespective of whether Q or other folding reaction coordinates are used for the 
transition state analysis (see, e.g., ref. 124). Figure 12d shows that the folding front fac- 
tor decreases with more native conditions, and the unfolding front factor also decreases 
with more denaturing conditions. In short, there appears to be an aversion to speed in 
these models' energetics. We tentatively attribute the slowing down in these models to a 
possible combination of effects of internal friction (conformational search problems com- 
pounded by more native conditions)^^ and external friction (implicit solvent viscosity). 
The origins of these effects remain to be better elucidated. For example, in some model- 
ing situations, folding- arm rollovers are related to the onset of downhill folding. ^^^'^^^ 
The chevron rollovers in the folding and unfolding arms of the NCS2 without-solvation 
model may be similarly related to downhill scenarios (see, e.g., the e = 0.90 and e = 0.70 
profiles in Figure 12a). At least for the NCS2 with-solvation model in Figure 6, the fact 
that no deadtime intermediate was observed during our simulation suggests that such a 
mechanism is not necessary for chevron rollovers. ^^'^^ In this example, chevron rollover 
emerges as a kinetic front- factor effect. 
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DISCUSSION 



We have compared two different native contact sets, and three different formulations 
of Go-like interactions with and without dcsolvation barriers. The predictions of these 
native-centric models were evaluated against generic thermodynamic and kinetic prop- 
erties of small single-domain proteins that these models were designed to mimic in the 
first place. We learnt several lessons. First, proteinlike thermodynamic cooperativity 
requires nonlocal contact-like interactions acting in concert with local conformational 
favorabilities for the native fold^^'^^'^^'^^ (Figure 5). Second, some basic predictions of 
native-centric models, such as the free energy profiles in Figure 4, are significantly de- 
pendent on the native contact set and interaction scheme used, even if the choice is 
made among physically reasonable definitions. A recent study^^'^^ shows also that free 
energy profiles of native-centric models are sensitive to the chain's presumed persistence 
length and energetic barriers to bond rotations. ^^'^'^^ Third, we found that pairwise des- 
olvation barriers in native-centric models could lead to some proteinlike properties such 
as a higher free energy barrier separating the native and denatured states (c.f. Fig- 
ure 4a, b and c) as well as more hnear chevron plots (c.f. Figures 7, 8 and 9). These 
predictions are encouraging as they provide insight into corresponding features in real 
proteins. Fundamentally, however, the kinetics of all present native-centric models for 
CI2 do not resemble that of real CI2. The models with pairwise dcsolvation barriers, 
like those without, are kinetically non-two-state in the operational sense that they have 
large chevron rollovers. 

Fourth, the significant differences between the predictions of with- and without- 
solvation models underscore the importance of proper accounting for the energetic cost 
of water expulsion in protein folding models, and that caution should be used when inter- 
preting results obtained from effective potentials that do not have dcsolvation barriers. 
The barrier height in the present with- solvation models simulated at T = 0.82 is 0.24 e ksT 
For real proteins, the dcsolvation barrier heights encountered by the polypeptide chain 
as a part of the potential of mean force are expected to be sensitive to temperature. 
Thus, the present results should also bear on explicit-solvent unfolding simulations at 
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high temperatures and the degree of dependencies of protein folding mechanisms on 
temperature. ^^'^ Of relevance here is the model system of a pair of methanes in water. 
Recent Monte Carlo simulations in the TIP4P water model indicate that their desolva- 
tion barrier is reduced from approximately 0.16 to 0.12 kcal/mol {0.27kBT to O.IG/cbT") 
when temperature is increased from 298K to 368K under atmospheric pressure. Un- 
der typical high-temperature unfolding conditions of 498K and a water density of 0.829 
gm/ml/^^ the desolvation barrier height is further reduced to ~ 0.05 kcal/mol or O.ObkBT 
(Figure 16.3 in ref. 18; S. Shimizu and H. S. C, personal communication). 

In a broader perspective, solvent-mediated interactions are known to be intrinsi- 
cally pairwise nonadditive/^'^^ and the collapse of a hydrophobic chain may involve 
large length-scale dynamic effects. In this light, that the pairwise desolvation barri- 
ers here fail to produce simple two-state chevron plots is not too surprising. Indeed, 
recent explicit-water simulations show that the sign of heat capacity of the free energy 
barrier against folding is opposite to that against the association of a pair of methane 
molecules. Considerations of a three-methane model system further indicate that 
the height of desolvation barrier is clearly nonadditive, and the sign and magnitude of 
this nonadditivty is dependent upon the configuration of the nonpolar solutes involved. 
Hence, solvation effects beyond the pairwise formulation considered here are likely needed 
to account for simple two-state protein folding/unfolding kinetics. 

In summary, the present findings imply that the actual solvent-mediated interactions 
in real proteins are much more specific and well-designed than one would otherwise posit. 
In short, real proteins are more cooperative than common Co-like models with pairwise 
additive interactions. Nonetheless, recent innovations in native-centric modeling have 
been immensely valuable. As discussed above, they do capture part of the essential 
physics. Many deep insight would not have been gained without them (see, e.g., refs. 1, 
29, 33). But, at the same time, the limitations of common Gb-like chain models^^'^^ may 
be more basic than previously appreciated. The present analysis implies that more pro- 
teinlike interaction schemes are yet to be discovered. Every model considered here except 
the contact-dominant variety can fold to the CI2 native structure. Qualitatively, the free 
energy profiles of the NCS2 models fit the expectation for that of small single-domain 
proteins as well. Yet their kinetics are fundamentally different from that of CI2. Thus, a 
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protein model's ability to fold to one single target structure does not guarantee the ade- 
quacy of its energetics; and the microscopic origin of simple two-state folding/unfolding 
kinetics remains to be elucidated. Our effort to address some of these questions is un- 
derway. Apparently, chevron rollovers can be essentially eliminated in more cooperative 
chain models with added energetic favorabilities for the ground-state and near-ground- 
state structures beyond that provided by the additive schemes in common Go models. 
These results will be presented in a subsequent report (H. K. & H. S. C, in prepara- 
tion). In the ongoing quest for a better understanding of protein energetics through 
the design and interpretation of novel physical models, proteinlike statistical mechanics 
properties such as calorimetric two-state cooperativity^^'^^'^^'^^'^^'^^°'^^^ and simple two- 
state chevron behavior^^ should be useful as stringent but necessary modeling constraints. 
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Table 1. Number of trajectories Nt used in the present study to determine the MFPT 
of folding and unfolding for the NCSl and NCS2 without-solvation models {T — 0.82, 
Figure 7). Each MFPT hsted is the average (arithmetic mean) over Nt first passage 
times for the given interaction strength e. Time is measured from the start of a given 
simulation ai t — in units of St (see text) . 
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Table 2. Same as Table 1, but for the without-solvation-SSR models {T — 0.64, 
Figure 8). 
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1519.4 


3 



Table 3. Same as Table 1, but for the with-solvation models {T — 0.82, Figure 9). 
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Figure Captions 



Figure 1. Native contact maps of tlie 64-residue truncated form of ciiymotrypsin 
inliibitor 2 (2ci2) used in tlie present investigation, (a) Contact maps for the native 
contact sets NCSl (green dots) and NCS2 (red dots) as defined in the text. Numbering 
of amino acids in these maps is initiahzed at residue 20 of the full-length 83-residue 
CI2; i.e., residue 1 in (a) corresponds to Leu 20 in the untruncated protein, (b, c, d) 
Similarities and differences between native contact maps. Contact pairs are indicated by 
color lines joining C^ positions along the backbone (black trace), (b) Contacts shared 
by NCSl and NCS2. (c) Contacts in NCSl but not in NCS2. (d) Contacts in NCS2 but 
not in NCSl. 

Figure 2. Different native contact definitions. Amino acid numbering here corre- 
sponds to that of the full-length CI2, i.e., numbering in this figure equals to that in 
Fig. la plus 19. The red contact belongs to NCS2 but not NCSl. This pair of residues 
has a Ca-Ca distance of 11.24 A, with closest atomic separation between the residues 
equals 4.3 A. The green contact belongs to NCSl but not NCS2. The Cq-C^ distance 
between this pair of residue is 5.36 A. 

Figure 3. (a) Model with-solvation interactions between two amino acid residues 
belonging to a given native contact pair in the present study (defined by either NCSl 
or NCS2); r is their C^-Ca separation. The potential energy U{r), shown here and in 
part (b) in units of e, depends also on the native Cq,-Cq distance r' of a given contact in 
the PDB structure. The r' values shown in this figure are only for illustrative purposes. 
They do not correspond to actual contacts in the present NCSl or NCS2 models (see 
text). U{r) here is defined by the functional form of Cheung et al.^° with k = 6, n = 2, 
m = 3, e' = 0.2e, and e" = O.le, where k, n, and m parametrize the functional form 
for r < r' , r' < r < and r > r'^ , respectively (e.g., the excluded volume repulsion 
~ r^^''', see equation on page 689 of ref. 40 for details), e' is the depth of water-separated 
minimum, and e" is the height of the desolvation peak. The two potential functions 
shown in (a) are for r' = 4.0 A (left) and r' = 6.5 A (right). The cartoons (for 
r' — 6.5 A) illustrate contact and water-separated minima configurations,^"'*^ where a 
water molecule is depicted as a solid circle of diameter pa 3 A. (b) With-solvation na- 
tive potential for r' = 3.8 A in the present study (as labeled) is compared with: (i) 
The r' — 3.8 A functional form of Cheung et al. with the same values for k, n, m, 
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and e as in (a), but with e' = e/3 and e" = 5e/9. (ii) The cxphct-watcr simulated 
methane-methane PMF at 25° C under atmospheric pressure obtained by Shimizu and 
Chan,^^ in a unit such that the free energy at contact equals — e = —1. (LJ) The 10-12 
Lennard- Jones without- solvation potential e[5{r' / — 6{r' /rY^] (as in equation 1) with 
r' = 3.8 A. (SSR) The corresponding LJ cutoff at 1.2r' in without-solvation-SSR models. 

Figure 4. Free energy profiles for NCSl (dashed curves) and NCS2 (solid curves), 
using native contact potentials without (a, b) and with (c) desolvation barriers, (a) is 
for without-solvation models that use the full spatial range of the LJ terms whereas (b) 
is for without-solvation-SSR models with LJ cutoffs. The variable Q is the number of 
native contacts in a conformation divided by the number of contacts in the native con- 
formation of the given model. P{Q) is the normalized population distribution over Q. 
The — In P{Q) profiles are computed at each model's approximate transition midpoint: 
(a) at e/ksT = 0.988 and 0.988 for the NCSl and NCS2 without-solvation models, (b) 
at e/ksT = 1.563 and 1.547 for the NCSl and NCS2 without-solvation-SSR models, and 
(c) at e/ksT = 1.165 and 1.098 for the NCSl and NCS2 with-solvation models. For 
without-solvation models in (a) and (b), the condition for contact is r < 1.2r', as in 
ref. 32. For with-solvation models in (c), a pair of residues is defined to be in contact 
when r < — {r' + r")/2, i.e., when the Cq-Cq distance r is within the contact basin 
(r not larger than that of the desolvation peak), as in ref. 40. 

Figure 5. Thermodynamic cooperativity. Heat capacity as a function of tempera- 
ture is shown for seven models: (i) the contact- dominant model described in the text, 
(n) NCSl and (ni) NCS2 without-solvation-SSR models, (iv) NCSl and (v) NCS2 with- 
solvation models, and (vi) NCSl and (vii) NCS2 without-solvation models. Vertical 
dotted lines correspond to the transition midpoints marked in Figures 7-9. The com- 
puted van't Hoff to calorimetric enthalpy ratios (defined in ref. 47) for these models with 
no baseline subtractions are, respectively, k,2 = 0.30, 0.57, 0.61, 0.56, 0.63, 0.46, and 
0.50. The corresponding ratios after subtracting the empirical baselines (shown in the 
figure) are = 0.33, 0.98, 1.00, 1.00, 1.01, 0.97, and 0.99. The unit of every heat 
capacity scan plotted is for interaction strength e — 1. Each scan was calculated from 
the density of states estimated by histogram techniques. ^^'^^ The sampling simulations 
were conducted at temperature T and e values chosen around each model's transition 
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midpoint to efficiently cover both the folded and unfolded regions of the conformational 
space. For (i) - (vii), T = 0.32, 0.64, 0.64, 0.82, 0.82, 0.82, and 0.82, respectively, and 
e = 1.0, 1.0, 1.0, 0.955, 0.90, 0.81, and 0.81 were used. 

Figure 6. Signatures of two-state thermodynamics and sharp kinetic transitions 
between states in the with-solvation NCS2 model at T = 0.82. Time evolution is moni- 
tored by snapshots taken at every 400St during the simulations, (a) A folding/unfolding 
trajectory near the transition midpoint of this model at e = 0.90 {—e/k-oT — —1.098, 
AGu/kBT — 0.68). (b) A trajectory showing transient unfolding under moderately na- 
tive conditions at e = 0.92 {-e/k^T = -1.122, AGjksT = 2.70). (c) Scatter plot of 
potential energy V^l^ versus kinetic energy of the model protein (sum of mv'^/2 over 
all Ca positions) for the trajectory in (a). Each dot represents a snapshot. The first 
11 snapshots are connected by line segments to highlight the initial pre-equilibration 
process. The average kinetic energy is equal to 78.9. (d) is the corresponding scatter 
plot of potential energy versus Q for the trajectory in (a) and (c). 

Figure 7. Folding/unfolding kinetics and thermodynamics of the without-solvation 
NCSl (squares, dashed curves) and NCS2 (circles, solid curves) models at T = 0.82. Up- 
per panel: Chevron plots of negative natural logarithm of folding (filled symbols) and 
unfolding (open symbols) MFPT data from Table 1. The curves are guides for the eye. 
Unfolding simulations start with the native conformation; FPT is defined by the chain 
having < 25 native contacts. Folding simulations start with randomly generated open 
conformations with Q ^ 10%, FPT is defined by the chain achieving a Q value larger 
than or equal to that of the native free energy minima on the free energy profiles in Fig- 
ure 4A, i.e., Q > 112/137 for NCSl and Q > 120/142 for NCS2. Lower panel: The free 
energy of unfolding AG^ in units of ksT for each of the two models (dashed lines: NCSl, 
solid lines: NCS2) is the natural logarithm of the Boltzmann weight (population) of the 
folded state minus that of the denatured chain population with < 35 native contacts 
(upper curves) or that with < 70 native contacts (lower curves). Conformations with 
> 100 out of 137, and > 105 out of 142 native contacts (corresponding approximately 
to Q > 0.73 around the native minima in Figure 4A) are taken to be the folded states, 
respectively, of the NCSl and NCS2 models. Stabihty curves shown are obtained by 
histogram techniques from simulations at e = 0.80 and 0.81 for NCSl, and at e = 0.80, 
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0.81, and 0.82 for NCS2. The vertical dotted line marks the midpoint —e/k^T values at 
which AGu = for the two models. The V-shaped dashed-dot lines in the upper panel is 
an hypothetical simple two-state chevron plot that would be consistent with the models' 
approximately linear thermodynamic stability curves in the lower panel. Note that the 
qusai-linear stability curves of the two models have approximately the same slope. 

Figure 8. Same as Figure 7 but for the without-solvation-SSR models at T = 0.64. 
Simulation details not identical to that in Figure 7 are as follows. Upper panel: MFPTs 
are from Table 2. Here folding FPT is defined by the chain achieving Q — 1. Lower 
panel: Stability curves are given by the natural logarithm of the Boltzmann weight 
(population) of the folded state minus that of the denatured chain population with < 35 
native contacts (upper curves) or that with < 80 native contacts (lower curves) . Confor- 
mations with 132 out of 137, and 137 out of 142 native contacts (corresponding to the 
native minima on the free energy profiles in Figure 4B) are taken to be the folded states, 
respectively, of the NCSl and NCS2 models. The AG^/kBT stability curves remain es- 
sentially unchanged if the thermodynamic definitions for the folded states of these models 
are extended to Q > 132/137 for NCSl and Q > 137/142 for NCS2. Stability curves 
shown are obtained by histogram techniques from simulations at e = 0.97, 0.98, 0.99, 
and 1.00 for NCSl, and at e = 0.99, 1.00, and 1.01 as well as confirmed by simulations 
at several temperatures other than T — 0.64 for NCS2. 

Figure 9. Same as Figure 8 but for the with-solvation models at T = 0.82. Simula- 
tion details not identical to that in Figure 8 are as follows. Upper panel: MFPTs are 
from the 0.6 < e < 1.30 entries in Table 3. Otherwise the kinetic definitions of folding 
and unfolding are the same as that in Figure 8. Lower panel: Stability curves are given 
by the natural logarithm of the Boltzmann weight of the folded state minus that of the 
denatured chain population with < 30 native contacts (upper curves) or that with < 80 
(lower curves). The folded state is defined here by conformations with exactly Q = 1 (c.f. 
Figure 4C). The stability curves are obtained by histogram techniques from simulations 
at e = 0.955 and 0.96 for NCSl, and at e = 0.90 for NCS2. 

Figure 10. Approximate single exponential folding kinetics indicated by first pas- 
sage time (FPT) distributions. P{t)At is the fraction of trajectories with t — At/2 < 
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FPT < t + At/2, (a) The FPT distribution among the 1,097 folding trajectories un- 
der strongly native conditions at —e/k^T = —1.80 in the NCS2 without-solvation-SSR 
model (e = 1.15 entry in Table 2) is shown for bin sizes At = 10^ (filled circles) and 
At = 2 X 10^ (open circles). The solid line is the least-square fit through the At = 10^ 
data points, (b) Included for comparison is the FPT distribution among the 1,080 folding 
trajectories of the three-dimensional lattice model described on page 903 of Kaya and 
Chan^^ with e/k^T = —1.72 and At = 10^; t is the number of Monte Carlo time steps. 
The solid line is the least-square fit (correlation coefficient r — 0.95) through the data 
points shown in the main figure; 20 trajectories that give rise to a long-FPT tail in the 
full distribution (inset) are excluded from this fit. The dashed line is equation 6 with 
In(MFPT) = 16.25 from ref. 68; to was taken to be zero for this lattice case. 

Figure 11. Approximate single-exponential folding and unfolding kinetics. FPT 
distributions are presented as in Figure 10 A. Sofid lines are least-square fits through the 
data points shown. Numbers of trajectories in the distributions are given in Tables 1-3. 
Unfolding and folding data using NCSl (or NCS2) are plotted, respectively, by open 
and filled squares (or circles), (a) Without-solvation models, (b) Without-solvation-SSR 
models. The NCS2 folding plot here is identical to the At = 10^ case in Figure lOA. 
(c) With-solvation models. The e values for NCSl unfolding, folding, NCS2 unfolding, 
folding, and the corresponding At bin sizes for these different models are, respectively, 
(a) e = 0.77, 0.88, 0.77, 0.88, At/lO^ = 0.84, 1.4, 1.5, 1.2; (b) e = 0.85, 1.15, 0.85, 1.15, 
At/lO^ = 0.21, 1.3, 0.4, 1.0; (c) e = 0.70, 1.10, 0.70, 1.10, At/10<^ = 0.45, 2.0, 4.6, 5.5. 

Figure 12. Front factor analyses, (a - c): Free energy profiles for the NCS2 without- 
solvation (a), without-solvation-SSR (b) and with-solvation (c) models at the e values in- 
dicated (c.f. Figure 4). These plots are obtained from Boltzmann distributions computed 
by histogram techniques. Where appropriate, different line styles are used for profiles 
at different e values for clarity. Shaded areas are examples of unfolded, transition, and 
folded state regions considered (see text), (d) Correlations between rates and activation 
free energies deduced from free energy profiles are analyzed as in ref. 68. The vertical 
variable is given by InF = - In(MFPT) + AGV^b^ (equation 7). In the present analy- 
sis, AG^{e,T)/kBT = -ln[P(60/142 <Q < 80/142)/P(Q > 105/142)] (open triangles, 
without-solvation unfolding), -ln[P(60/142 < Q < 80/142)/P(g < 35/142)] (filled 
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triangles, without-solvation folding), -ln[P(75/142 <Q< 95/142)/P(Q = 137/142)] 
(open squares, without-solvation-SSR unfolding), -ln[P(75/142 <Q < 95/142)/P(Q < 
25/142)] (filled squares, without-solvation-SSR folding), -ln[P(70/142 <Q< 95/142)/P(g = 
1)] (open circles, with-solvation unfolding), and -ln[P(70/142 <Q< 95/142)/P(Q < 
25/142)] (filled circles, with-solvation folding). The middle shaded regions in (a - c) 
correspond to the transition-state regions used in the analysis in (d). NCSl models have 
similar trends (data not shown) . 
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